MEASUREMENTS
Abstract. Consider the problem of recovering an unknown signal from undersampled measurements,
given the knowledge that the signal has a sparse representation in a specified dictionary $D$. This problem
is now understood to be well-posed and efficiently solvable under suitable assumptions on the measurements
and dictionary, if the number of measurements scales roughly with the sparsity level. One sufficient
condition for this is the D-restricted isometry property (D-RIP), which asks that the sampling matrix
approximately preserve the norm of all signals which are sufficiently sparse in $D$. While many classes of random
matrices are known to satisfy such conditions, such matrices are not representative of the structural
constraints imposed by practical sensing systems. We close this gap in the theory by demonstrating that
one can subsample a fixed orthogonal matrix in such a way that the D-RIP will hold, provided this basis is
sufficiently incoherent with the sparsifying dictionary $D$. We also extend this analysis to allow for weighted
sparse expansions. Consequently, we arrive at compressive sensing recovery guarantees for structured
measurements and redundant dictionaries, opening the door to a wide array of practical applications.


1. Introduction 

1.1. Compressive Sensing. The compressive sensing paradigm, as first introduced by Candès and Tao
[CT05b] and Donoho [Don06], is based on using available degrees of freedom in a sensing mechanism to
tune the measurement system so as to allow for efficient recovery of a particular type of signal or image of 
interest from as few measurements as possible. A model assumption that allows for such signal recovery 
is sparsity: the signal can be well-approximated by just a few elements of a given representation system. 

Often, a near-optimal strategy is to choose the measurements completely at random, for example 
following a Gaussian distribution [RV08]. Typically, however, some additional structure is imposed by
the application at hand, and randomness can only be infused in the remaining degrees of freedom. For 
example, magnetic resonance imaging (MRI) is known to be well modeled by inner products with Fourier 
basis vectors. This structure cannot be changed, and the only aspect that can be decided at random is 
which Fourier basis vectors to select. 

An important difference between completely random measurement systems and structured random 
measurement systems is in the aspect of universality. Gaussian measurement systems, among many 
other systems with minimal imposed structure, are oblivious to the basis in which the underlying signal 
is sparse, and achieve equal reconstruction quality for all different orthonormal basis representations. 
For structured measurement systems, this is, in general, no longer the case. When the measurements are 
uniformly subsampled from an orthonormal basis such as the Fourier basis, one requires, for example, 
that the measurement basis and the sparsity basis are incoherent, that is, the inner products between vectors
from the two bases are small. If the two bases are not incoherent, more refined concepts of incoherence
are required [KW14, AHPR13]. An important example is that of the Fourier measurement basis and a
wavelet sparsity basis. Since both contain the constant vector, they are maximally coherent. 

1.2. Motivation. All of the above mentioned works, however, exclusively cover the case of sparsity in 
orthonormal basis representations. On the other hand, there are a vast number of applications in which 
sparsity is expressed not in terms of a basis but in terms of a redundant, often highly overcomplete, 
dictionary. Specifically, if $f \in \mathbb{C}^n$ is the signal of interest to be recovered, then one expresses $f = Dx$,
where $D \in \mathbb{C}^{n \times N}$ is an overcomplete dictionary and $x \in \mathbb{C}^N$ is a sparse (or nearly sparse) coefficient vector.
Redundancy is widespread in practice, either because no sparsifying orthonormal basis exists for the 
signal class of interest, or because the redundancy itself is useful and allows for a significantly larger, 
richer class of signals which can be sparsely represented in the resulting dictionary. For example, it has 
been well documented that overcompleteness is the key to a drastic reduction in artifacts and recovery 
error in the denoising framework [SED04, SFM07].

In the compressive sensing framework, results using overcomplete dictionaries are motivated by the 
broad array of tight (and often Parseval) frames appearing in practical applications. For example, if one 
assumes sparsity with respect to the Discrete Fourier Transform (DFT), this is implicitly assuming that 
the signal is well-represented by frequencies lying along the lattice of the DFT. To allow for more flexibility 
in this rigid assumption, one instead may employ the oversampled DFT frame, containing frequencies 
on a much finer grid, or even over intervals of varying widths. Gabor frames, which are often highly redundant,
are used in imaging as well as radar and sonar applications [Mal99]. Many of the most widely-used
frames in imaging applications such as undecimated wavelet frames [Dut89, SED04], curvelets [CD04],
shearlets [LLKW05, ELL08], framelets [CCS08], and many others, are overcomplete with highly correlated
columns.

1.3. Related Work. Due to the abundance of relevant applications, a number of works have studied 
compressive sensing for overcomplete frames. The first work on this topic aimed to recover the coefficient
vector $x$ directly, and thus required strong incoherence assumptions on the dictionary $D$ [RSV08].
More recently, it was noted that if one instead aims to recover $f$ rather than $x$, recovery guarantees can be
obtained under weaker assumptions. Namely, one only needs that the measurement matrix $A$ respects
the norms of signals which are sparse in the dictionary $D$. To quantify this, Candès et al. [CENR10] define
the D-restricted isometry property (D-RIP in short, see Definition 2.1 below). For measurement matrices
that have this property, a number of algorithms have been shown to guarantee recovery under certain
assumptions. Optimization approaches such as $\ell_1$-analysis [EMR07, CENR10, GNE+14, NDEG13, LML12,
RK13, Gir14] and greedy approaches [DNW12, GNE+14, GN13, PE13, GN14] have been studied.

This paper establishes the D-RIP for structured random measurements formed by subsampling orthonormal
bases, allowing for these types of recovery results to be utilized in more realistic settings. To
date, most random matrix constructions known to yield D-RIP matrices with high probability are random
matrices with a certain concentration property. As shown in [CENR10], such a property implies that
for arbitrary dictionary $D$, one obtains the D-RIP with high probability. This can be interpreted as a dictionary
version of the universality property discussed above. It has now been shown that several classes
of random matrices satisfy this property, as well as subsampled structured matrices after applying random
column signs [DG03, KW11]. The matrices in both of these cases are motivated by application scenarios,
but typically in applications they appear without the randomized column signs. In many cases
one is not able to apply column signs in practice; for example in cases where the measurements are fixed
such as in MRI, one has no choice but to use unsigned Fourier samples and cannot pre-process the data
to incorporate column signs. Without these signs however, such measurement ensembles will not work
for arbitrary dictionaries in general. This is closely related to the underlying RIP matrix constructions not
being universal. For example, it is clear that randomly subsampled Fourier measurements will fail for the
oversampled Fourier dictionary (for reasons similar to the basis case). In this work, we address this issue,
deriving recovery guarantees that take into account the dictionary. Similar to the basis case, our analysis
will be coherence based. A similar approach has also been taken by Poon in [Poo15b] (completed after
the first version of our paper), for an infinite dimensional version of the problem.

1.4. Contribution. Our main result shows that a wide class of orthogonal matrices having uniformly 
bounded entries can be subsampled to obtain a matrix that has the D-RIP and hence yields recovery 
guarantees for sparse recovery in the setting of redundant dictionaries. As indicated above, our technical
estimates below will imply such guarantees for various algorithms. As an example we focus on
the method of $\ell_1$-analysis, for which the first D-RIP based guarantees were available [CENR10]. Our
technical estimates will also provide more general guarantees for weighted $\ell_1$-analysis minimization (a
weighted version of $(P_1)$, see $(P_{1,w})$ in Section 2.2 for details) in case one has prior knowledge of the
underlying sparsity support.

Recall that the method of $\ell_1$-analysis consists of estimating a signal $f$ from noisy measurements $y =
Af + e$ by solving the convex minimization problem

$$f^\sharp = \operatorname*{argmin}_{\tilde f \in \mathbb{C}^n} \|D^* \tilde f\|_1 \quad \text{such that} \quad \|A \tilde f - y\|_2 \le \varepsilon, \tag{$P_1$}$$

where $\varepsilon$ is the noise level, that is, $\|e\|_2 \le \varepsilon$. The $\ell_1$-analysis method (like alternative approaches) assumes
that for the signal $f = Dx$, not only is the underlying (synthesis) coefficient sequence $x$ sparse (typically
unknown and hard to obtain), but also the analysis coefficients $D^* f$ are nearly sparse, i.e., dominated
by a few large entries. We refer the reader to Theorem 2.2 below for the precise formulation of the resulting
recovery guarantees (as derived in [CENR10]). The assumption has been observed empirically for
many dictionaries used in practice such as the Gabor frame, undecimated wavelets, curvelets, etc. (see,
e.g., [CENR10] for a detailed description of such frames) and is also key for a number of thresholding
approaches to signal denoising [CD04, KL07, ELL08].

A related and slightly stronger signal model, in which the analysis vector $D^* f$ is sparse or nearly
sparse, has been considered independently from coefficient sparsity (e.g., [NDEG13]), and is commonly
called the co-sparsity model.

The results in this paper need a similar, but somewhat weaker assumption to hold for all signals corresponding
to sparse synthesis coefficients $x$. Namely, one needs to control the localization factor as we
now introduce.

Definition 1.1. For a dictionary $D \in \mathbb{C}^{n \times N}$ and a sparsity level $s$, we define the localization factor as

$$\eta_{s,D} = \eta \overset{\mathrm{def}}{=} \sup_{\|Dz\|_2 = 1,\ \|z\|_0 \le s} \frac{\|D^* D z\|_1}{\sqrt{s}}. \tag{1.1}$$
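
As an illustration of Definition 1.1, the following minimal Python sketch evaluates the ratio inside the supremum in (1.1) for a given vector and returns a Monte Carlo lower bound on $\eta_{s,D}$. It is not the method used in this paper: the function names, the Gaussian choice of test vectors, and the NumPy dependency are all assumptions made for illustration, and random search only bounds the supremum from below.

```python
import numpy as np

def localization_ratio(D, z, s):
    """Return ||D^* D z||_1 / (sqrt(s) * ||D z||_2) for a vector z with at most s nonzeros.

    Dividing by ||D z||_2 normalizes z so that the constraint ||Dz||_2 = 1 in (1.1) holds.
    Assumes D z is nonzero (hypothetical helper, not from the paper).
    """
    f = D @ z
    return np.linalg.norm(D.conj().T @ f, 1) / (np.sqrt(s) * np.linalg.norm(f))

def localization_lower_bound(D, s, trials=2000, seed=0):
    """Monte Carlo lower bound on eta_{s,D}: the largest ratio seen over random s-sparse z."""
    rng = np.random.default_rng(seed)
    N = D.shape[1]
    best = 0.0
    for _ in range(trials):
        z = np.zeros(N, dtype=complex)
        support = rng.choice(N, size=s, replace=False)
        z[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
        best = max(best, localization_ratio(D, z, s))
    return best
```

For an orthonormal basis the ratio never exceeds 1 and equals 1 for a vector that is constant on its support. For the Parseval frame obtained by stacking two copies of the identity and scaling by $1/\sqrt{2}$, a single spike already gives the ratio $\sqrt{2}$.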

In the following we will mainly consider dictionaries which form Parseval frames, that is, $D^*$ is an
isometry. Then the term localization factor is appropriate because this quantity can be viewed as the factor
by which sparsity is preserved under the Gram matrix map $D^* D$, compared to the case where $D$ is an
orthonormal basis and $D^* D = I_n$ and $\eta = 1$ by Cauchy-Schwarz. For a general family of such frames
parameterized by the redundancy $N/n$, $\eta$ will increase with the redundancy; families of dictionaries where
such growth is relatively slow will be of interest. For example, new results on the construction of unit-norm
tight frames give a constructive method to generate spectral tetris frames [CFM+11], whose Gram
matrices have at most $2\lceil N/n \rceil + 6$ non-zeros per row, guaranteeing that $\eta$ is proportional only to the
redundancy factor $N/n$. In fact, one can show that these are the sparsest frames possible [CHKK11], suggesting
that families of tight frames with localization factor depending linearly on the redundancy factor should
be essentially optimal for $\ell_1$-analysis reconstruction. We pose as a problem for subsequent work to obtain
such estimates for dictionaries of practical interest, discussing a few examples in Section 3. First,
to illustrate that our results go strictly beyond existing theory, we show that harmonic frames (frames
constructed by removing the high frequency rows of the DFT, see (3.3)) with small redundancy indeed
have a bounded localization factor. To our knowledge, this important case is not covered by existing theories.
In addition, we also bound the localization factor for redundant Haar wavelet frames. We expect,
however, that it will be difficult to efficiently compute the localization factor of an arbitrary dictionary.
For bounded localization factors, we prove the following theorem.

Theorem 1.2. Fix a sparsity level $s < N$. Let $D \in \mathbb{C}^{n \times N}$ be a Parseval frame, i.e., $DD^*$ is the identity,
with columns $\{d_1, \dots, d_N\}$, and let $B = \{b_1, \dots, b_n\}$ be an orthonormal basis of $\mathbb{C}^n$ which is incoherent to $D$
in the sense that

$$\sup_{i \in [n]} \sup_{j \in [N]} |\langle b_i, d_j \rangle| \le K n^{-1/2} \tag{1.2}$$

for some constant $K \ge 1$.
Consider the (unweighted) localization factor $\eta = \eta_{s,D}$ as in Definition 1.1. Construct $\tilde B \in \mathbb{C}^{m \times n}$ by
sampling row vectors from $B$ i.i.d. uniformly at random. Provided

$$m \ge C s K^2 \eta^2 \log^3(s \eta^2) \log(N), \tag{1.3}$$

then with probability $1 - N^{-\log^3(2s)}$, $\sqrt{n/m}\,\tilde B$ exhibits uniform recovery guarantees for $\ell_1$-analysis. That is,
for every signal $f$, the solution $f^\sharp$ of the minimization problem $(P_1)$ with $y = \sqrt{n/m}\,\tilde B f + e$ for noise $e$ with
$\|e\|_2 \le \varepsilon$ satisfies

$$\|f - f^\sharp\|_2 \le C_1 \varepsilon + C_2 \frac{\|D^* f - (D^* f)_s\|_1}{\sqrt{s}}. \tag{1.4}$$

Here, $C$, $C_1$, and $C_2$ are absolute constants independent of the dimensions and the signal $f$.


Above and throughout, $(u)_s$ denotes the best $s$-sparse approximation to a signal $u$, that is, $(u)_s =
\operatorname{argmin}_{\|z\|_0 \le s} \|u - z\|_2$.
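
The best $s$-sparse approximation and the signal-dependent term on the right-hand side of (1.4) can be computed directly. The sketch below is illustrative only: the helper names `best_s_sparse` and `tail_term` are hypothetical, NumPy is an assumed dependency, and ties between entries of equal magnitude are broken arbitrarily by the sort.

```python
import numpy as np

def best_s_sparse(u, s):
    """Keep the s largest-magnitude entries of u and set the rest to zero."""
    out = np.zeros_like(u)
    if s > 0:
        keep = np.argsort(np.abs(u))[-s:]   # indices of the s largest |u_j|
        out[keep] = u[keep]
    return out

def tail_term(D, f, s):
    """Return ||D^* f - (D^* f)_s||_1 / sqrt(s), the tail term appearing in (1.4)."""
    c = D.conj().T @ f                      # analysis coefficients D^* f
    return np.linalg.norm(c - best_s_sparse(c, s), 1) / np.sqrt(s)
```

For example, with $u = (3, -1, 0.5, -4)$ and $s = 2$, the approximation keeps the entries $3$ and $-4$, and for $D$ equal to the identity the tail term is $(1 + 0.5)/\sqrt{2}$.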

Remarks. 

1. Our stability result (1.4) for $\ell_1$-analysis guarantees recovery for the particular $f = Dz$ only up to the
scaled $\ell_1$ norm of the tail of $D^* D z$. In fact, the quantity $\mathcal{E} = \|D^* f - (D^* f)_s\|_1 / \sqrt{s}$, referred to as the
unrecoverable energy of the signal $f$ in $D$ [CENR10], is closely related to the localization factor $\eta_s$:

$$\eta_s = \sup_{f = Dz:\ \|f\|_2 = 1,\ \|z\|_0 \le s} \frac{\|D^* f\|_1}{\sqrt{s}} \le \sup_{f = Dz:\ \|f\|_2 = 1,\ \|z\|_0 \le s} \mathcal{E} + 1.$$


2. As mentioned, the proof of this theorem will proceed via the D-RIP, so our analysis yields similar 
guarantees for other recovery methods as well. 

3. It is interesting to remark on the role of incoherence in this setting. While prior work in compressed
sensing has required the measurement matrix $B$ itself be incoherent, we now ask instead for incoherence
between the basis from which measurements are selected and the dictionary $D$, rather than incoherence
within $D$. Of course, note that the coherence of the dictionary $D$ itself impacts the localization factor $\eta_{s,D}$.
We will see later in Theorem 3.1 that the incoherence requirement between the basis and dictionary can
be weakened even further by using a weighted approach.

4. The assumption that $D$ is a Parseval frame is not necessary but made for convenience of presentation.
Several results have been shown for frames which are not tight, e.g., [LML12, RK13, Gir14]. Indeed, any
frame $D \in \mathbb{C}^{n \times N}$ with linearly independent rows is such that

$$\tilde D := (DD^*)^{-1/2} D$$

is a tight frame. As observed in the recent paper [Fou15], in this case one can use for sampling the
adapted matrix $\tilde A = \tilde B (DD^*)^{-1/2}$, as measuring the signal $f = Dz$ through $y = \tilde A f + e$ is equivalent to
measuring the signal $\tilde f = \tilde D z$ through $y = \tilde B \tilde f + e$. Working through this extension poses an interesting
direction for future work.
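
The construction in Remark 4 can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the paper: the name `parsevalize` is hypothetical, NumPy is an assumed dependency, and the inverse square root is computed through a Hermitian eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def parsevalize(D):
    """Return ((DD^*)^{-1/2} D, (DD^*)^{-1/2}) for a frame D with linearly independent rows."""
    G = D @ D.conj().T                      # n x n frame operator, Hermitian positive definite
    evals, evecs = np.linalg.eigh(G)
    # Scale column j of evecs by evals[j]^{-1/2}, then map back: V diag(evals^{-1/2}) V^*.
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.conj().T
    return inv_sqrt @ D, inv_sqrt
```

With `D_tilde, S = parsevalize(D)`, one expects `D_tilde @ D_tilde.conj().T` to be the identity up to rounding, and `(B_tilde @ S) @ (D @ z)` to agree with `B_tilde @ (D_tilde @ z)` for any sampling matrix `B_tilde` and coefficient vector `z`, which is the equivalence of the two measurement models described in Remark 4.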


1.5. Organization. The rest of the paper is organized as follows. We introduce some notation and review
some technical background in Section 2 before presenting a technical version of our main result
in Section 3, which demonstrates that bases which are incoherent with a dictionary $D$ can be randomly
subsampled to obtain D-RIP matrices. In that section, we also discuss some relevant applications and
the implications of our results, including the regime where the sampling basis is coherent to the dictionary.
The proof of our main theorem is presented in the final section.


2. Notation and technical background 

Throughout the paper, we write $C$, $C'$, $C_1$, $C_2$, ... to denote absolute constants; their values can change
between different occurrences. We write $I_n$ to denote the $n \times n$ identity matrix and we denote by $[n]$
the set $\{1, 2, \dots, n\}$. For a vector $u$, we denote the $j$th entry by $u[j]$ or $u(j)$ or $u_j$, depending on which
notation is the most clear in any given context; its restriction to only those entries indexed by a subset
$\Lambda$ is denoted by $u_\Lambda$. These notations should not be confused with the notation $(u)_s$, which we use to
denote the best $s$-sparse approximation to $u$, that is, $(u)_s \overset{\mathrm{def}}{=} \operatorname{argmin}_{\|z\|_0 \le s} \|u - z\|_2$. Similarly, for a matrix $A$,
$A_j$ is the $j$th column and $A_\Lambda$ is the submatrix consisting of the columns with indices in $\Lambda$.

2.1. The unweighted case with incoherence. Recall that a dictionary $D \in \mathbb{C}^{n \times N}$ is a Parseval frame if
$DD^* = I_n$ and that a vector $x$ is $s$-sparse if $\|x\|_0 := |\operatorname{supp}(x)| \le s$. Then the restricted isometry property
with respect to the dictionary $D$ (D-RIP) is defined as follows.

Definition 2.1 ([CENR10]). Fix a dictionary $D \in \mathbb{C}^{n \times N}$ and matrix $A \in \mathbb{C}^{m \times n}$. The matrix $A$ satisfies the
D-RIP with parameters $\delta$ and $s$ if

$$(1 - \delta)\|Dx\|_2^2 \le \|ADx\|_2^2 \le (1 + \delta)\|Dx\|_2^2 \tag{2.1}$$

for all $s$-sparse vectors $x \in \mathbb{C}^N$.
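
To get a feel for Definition 2.1, one can probe (2.1) empirically. The sketch below is a hedged illustration rather than a verification procedure: the function name `drip_deviation` is hypothetical, NumPy is an assumed dependency, and sampling random $s$-sparse vectors only yields a lower bound on the smallest admissible $\delta$, since the definition quantifies over all $s$-sparse $x$.

```python
import numpy as np

def drip_deviation(A, D, s, trials=500, seed=0):
    """Largest observed | ||A D x||_2^2 / ||D x||_2^2 - 1 | over random s-sparse x.

    This is only a lower bound on the D-RIP constant delta of A at sparsity level s.
    """
    rng = np.random.default_rng(seed)
    N = D.shape[1]
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
        Dx = D @ x
        ratio = np.linalg.norm(A @ Dx) ** 2 / np.linalg.norm(Dx) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst
```

As a sanity check, taking $A$ to be the identity gives a deviation of zero up to rounding, while a Gaussian matrix with $m < n$ rows scaled by $1/\sqrt{m}$ gives a strictly positive value that tends to shrink as $m$ grows.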

Note that when $D$ is the identity, this definition reduces to the standard definition of the restricted
isometry property [CT05a]. Under such an assumption on the measurement matrix, the following result
bounds the reconstruction error for the $\ell_1$-analysis method.

Theorem 2.2 ([CENR10]). Let $D$ be a tight frame, $\varepsilon > 0$, and consider a matrix $A$ that has the D-RIP
with parameters $2s$ and $\delta < 0.08$. Then for every signal $f \in \mathbb{C}^n$, the reconstruction $f^\sharp$ obtained from noisy
measurements $y = Af + e$, $\|e\|_2 \le \varepsilon$, via the $\ell_1$-analysis problem $(P_1)$ above satisfies

$$\|f - f^\sharp\|_2 \lesssim \varepsilon + \frac{\|D^* f - (D^* f)_s\|_1}{\sqrt{s}}. \tag{2.2}$$

Thus as indicated in the introduction, one obtains good recovery for signals $f$ whose analysis coefficients
$D^* f$ have a suitable decay.

Both the RIP and the D-RIP are closely related to the Johnson-Lindenstrauss lemma [HV11, AC09,
BDDW08]. Recall the classical variant of the lemma states that for any $\varepsilon \in (0, 0.5)$ and points $x_1, \dots, x_d \in
\mathbb{R}^n$, there exists a Lipschitz function $f: \mathbb{R}^n \to \mathbb{R}^m$ for some $m = O(\varepsilon^{-2} \log d)$ such that

$$(1 - \varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1 + \varepsilon)\|x_i - x_j\|_2^2 \tag{2.3}$$

for all $i, j \in \{1, \dots, d\}$. Recent improvements have been made to this statement which in particular show
that the map $f$ can be taken as a random linear mapping which satisfies (2.3) with high probability (see,
e.g., [Ach03] for an account of such improvements). Indeed, any random matrix $A \in \mathbb{C}^{m \times n}$ which for a fixed$^1$
vector $z \in \mathbb{C}^n$ satisfies

$$\mathbb{P}\left( (1 - \delta)\|z\|_2^2 \le \|Az\|_2^2 \le (1 + \delta)\|z\|_2^2 \right) \ge 1 - C e^{-cm}$$

will satisfy the D-RIP with high probability as long as $m$ is at least on the order of $s \log(n/s)$ [CENR10].
From this, any matrix satisfying the Johnson-Lindenstrauss lemma will also satisfy the D-RIP (see [BDDW08]
for the proof of the RIP for such matrices, which directly carries over to the D-RIP). Random matrices
known to have this property include matrices with independent subgaussian entries (such as Gaussian
or Bernoulli matrices), see for example [DG03]. Moreover, it is shown in [KW11] that any matrix that
satisfies the classical RIP will satisfy the Johnson-Lindenstrauss lemma and thus the D-RIP with high
probability after randomizing the signs of the columns. The latter construction allows for structured
random matrices with fast multiplication properties such as randomly subsampled Fourier matrices (in
combination with the results from [RV08]) and matrices representing subsampled random convolutions
(in combination with the results from [RRT12, KMR14]); in both cases, however, again with randomized
column signs. While this gives an abundance of such matrices, as mentioned above, it is not always
practical or possible to apply random column signs in the sampling procedure.


$^1$By “fixed” we mean to emphasize that this probability bound must hold for a single vector $z$ rather than for all vectors $z$,
as one typically sees in the restricted isometry property.
An important general set-up for structured random sensing matrices known to satisfy the regular RIP
is the framework of bounded orthonormal systems, which includes as a special case the subsampled discrete
Fourier transform measurements (without column signs randomized). Such measurements are the
only natural measurements possible in many physical systems where compressive sensing is of interest,
such as in MRI, radar, and astronomy [LDP07, BS07, HS09, BSO08], as well as in applications to polynomial
interpolation [RW12, BDWZ12, RW11] and uncertainty quantification [HD15]. In the following, we
recall this set-up (in the discrete setting); see [Rau10, Sec. 4] for a detailed account of bounded orthonormal
systems and examples.

Definition 2.3 (Bounded orthonormal system). Consider a probability measure $\nu$ on the discrete set $[n]$
and a system $\{r_j \in \mathbb{C}^n,\ j \in [n]\}$ that is orthonormal with respect to $\nu$ in the sense that

$$\sum_{i} \overline{r_k(i)}\, r_j(i)\, \nu_i = \delta_{j,k},$$

where $\delta_{j,k} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{else} \end{cases}$ is the Kronecker delta function. Suppose further that the system is uniformly
bounded: there exists a constant $K \ge 1$ such that

$$\sup_{i \in [n]} \sup_{j \in [n]} |r_j(i)| \le K. \tag{2.4}$$

Then the matrix $A \in \mathbb{C}^{n \times n}$ whose rows are $r_j$ is called a bounded orthonormal system matrix.

Drawing $m$ indices $i_1, i_2, \dots, i_m$ independently from the orthogonalization measure $\nu$, the sampling
matrix $\tilde A \in \mathbb{C}^{m \times n}$ whose rows are indexed by the (re-normalized) sampled vectors $\sqrt{1/m}\, r_j(i_k) \in \mathbb{C}^n$ will
have the restricted isometry property with high probability (precisely, with probability exceeding $1 -
n^{-C \log^3(s)}$) provided the number of measurements satisfies

$$m \ge C K^2 s \log^2(s) \log(n). \tag{2.5}$$

This result was first shown in the case where $\nu$ is the uniform measure by Rudelson and Vershynin [RV08],
for a slightly worse dependence of $m$ on the order of $s \log^2 s \log(s \log n) \log(n)$. These results were subsequently
extended to the general bounded orthonormal system set-up by Rauhut [Rau10], and the dependence
of $m$ was slightly improved to $s \log^3 s \log n$ in [RW13].

An important special case where these results can be applied is that of incoherence between the measurement
and sparsity bases. Here the coherence between two sets of vectors $\Phi = \{\phi_i\}$ and $\Psi = \{\psi_j\}$ is
given by $\mu = \sup_{i,j} |\langle \phi_i, \psi_j \rangle|$. Two orthonormal bases $\Phi$ and $\Psi$ of $\mathbb{C}^n$ are called incoherent if $\mu \le K n^{-1/2}$.
In this case, the renormalized system $\tilde\Phi = \{\sqrt{n}\, \phi_i \Psi^*\}$ is an orthonormal system with respect to the uniform
measure, which is bounded by $K$. Then the above results imply that signals which are sparse in
basis $\Psi$ can be reconstructed from inner products with a uniformly subsampled subset of basis $\Phi$. These
incoherence-based guarantees are a standard criterion to ensure signal recovery; such results were first
observed in [CR07].
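
The coherence $\mu$ and the associated constant $K$ are easy to compute for small examples. The sketch below is an illustration with assumed names (`coherence`, and the variables `F`, `I`, `mu`, `K`) and an assumed NumPy dependency; bases are stored as the columns of a matrix. It uses the unitary DFT basis against the canonical basis, a standard incoherent pair, and contrasts it with a basis measured against itself.

```python
import numpy as np

def coherence(Phi, Psi):
    """mu = max_{i,j} |<phi_i, psi_j>| for bases stored as the columns of Phi and Psi."""
    return np.max(np.abs(Phi.conj().T @ Psi))

n = 16
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix; its columns are the Fourier basis
I = np.eye(n)                            # canonical (spike) basis
mu = coherence(F, I)                     # every inner product has modulus n^{-1/2}
K = mu * np.sqrt(n)                      # smallest K with mu <= K n^{-1/2}
```

Here `K` equals 1 up to rounding, the smallest value possible, whereas `coherence(F, F)` equals 1, which corresponds to the maximally coherent case $K = \sqrt{n}$.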

2.2. Generalization to a weighted setup and to local coherence. Recently, the criterion of incoherence 
has been generalized to the case where only most of the inner products between sparsity basis vectors 
and measurement basis vectors are bounded. If some vectors in the measurement basis yield larger 
inner products, one can adjust the sampling measure and work with a preconditioned measurement 
matrix [KW14], and if some vectors in the sparsity basis yield larger inner products, one can incorporate
weights into the recovery schemes and work with a weighted sparsity model [RW13]. Our results can
accommodate both these modifications. In the remainder of this section, we will recall some background
on these frameworks from [KW14, RW13] and formulate our definitions for the more general setup.
Consider a set of positive weights $\omega = (\omega_j)_{j \in [N]}$. Associate to these weights the norms

$$\|x\|_{\omega,p} := \Big( \sum_j \omega_j^{2-p} |x_j|^p \Big)^{1/p}, \qquad 0 < p \le 2, \tag{2.6}$$

along with the weighted “$\ell_0$-norm”, or weighted sparsity: $\|x\|_{\omega,0} := \sum_{j : |x_j| > 0} \omega_j^2$, which equivalently is
defined as the limit $\|x\|_{\omega,0} = \lim_{p \to 0} \|x\|_{\omega,p}^p$. We say that a vector $x$ is weighted $s$-sparse if $\|x\|_{\omega,0} :=
\sum_{j : |x_j| > 0} \omega_j^2 \le s$. In line with this definition, the weighted size of a finite set $\Lambda \subset \mathbb{N}$ is given by $\omega(\Lambda) :=
\sum_{j \in \Lambda} \omega_j^2$; thus a vector is weighted $s$-sparse if its support has weighted size at most $s$. When $\omega_j \ge 1$ for
all $j$, the weighted sparsity of a vector is at least as large as its unweighted sparsity, so that the class
of weighted $s$-sparse vectors is a subset of the $s$-sparse vectors. We make this assumption in the
remainder. Note in particular the special cases $\|x\|_{\omega,1} = \sum_j \omega_j |x_j|$ and $\|x\|_{\omega,2} = \|x\|_2 = \sqrt{\sum_j |x_j|^2}$; by
Cauchy-Schwarz, it follows that $\|x\|_{\omega,1} \le \sqrt{s}\,\|x\|_2$ if $x$ is weighted $s$-sparse. Indeed, we can extend the
notions of localization factor (1.1) and D-RIP (2.1) to the weighted sparsity setting. It should be clear
from context which definition we refer to in the remainder.
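
The weighted quantities in (2.6) translate directly into code. The following Python sketch is illustrative only; the function names are hypothetical and NumPy is an assumed dependency. It implements $\|x\|_{\omega,p}$ and the weighted sparsity $\|x\|_{\omega,0}$, and can be used to check the Cauchy-Schwarz bound $\|x\|_{\omega,1} \le \sqrt{\|x\|_{\omega,0}}\,\|x\|_2$ on examples.

```python
import numpy as np

def weighted_norm(x, w, p):
    """||x||_{w,p} = (sum_j w_j^(2-p) |x_j|^p)^(1/p), for 0 < p <= 2."""
    return np.sum(w ** (2 - p) * np.abs(x) ** p) ** (1.0 / p)

def weighted_sparsity(x, w):
    """||x||_{w,0} = sum of w_j^2 over the support of x."""
    return np.sum(w[np.abs(x) > 0] ** 2)
```

For $x = (3, 0, -4, 0)$ and $\omega = (1, 2, 1.5, 1)$ one obtains $\|x\|_{\omega,2} = 5$, $\|x\|_{\omega,1} = 3 + 6 = 9$ and $\|x\|_{\omega,0} = 1 + 2.25 = 3.25$, and indeed $9 \le \sqrt{3.25} \cdot 5 \approx 9.01$.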


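Definition 2.4. For a dictionary $D \in \mathbb{C}^{n \times N}$, weights $\omega_1, \omega_2, \dots, \omega_N \ge 1$, and a sparsity level $s$, we define the
(weighted) localization factor as

$$\eta_{\omega,s,D} = \eta \overset{\mathrm{def}}{=} \sup_{\|Dz\|_2 = 1,\ \|z\|_{\omega,0} \le s} \frac{\|D^* D z\|_{\omega,1}}{\sqrt{s}}. \tag{2.7}$$

Definition 2.5. Fix a dictionary $D \in \mathbb{C}^{n \times N}$, weights $\omega_1, \omega_2, \dots, \omega_N \ge 1$, and matrix $A \in \mathbb{C}^{m \times n}$. The matrix
$A$ satisfies the D-$\omega$RIP with parameters $\delta$ and $s$ if

$$(1 - \delta)\|Dx\|_2^2 \le \|ADx\|_2^2 \le (1 + \delta)\|Dx\|_2^2 \tag{2.8}$$

for all weighted $s$-sparse vectors $x \in \mathbb{C}^N$.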

When $D$ is an orthonormal matrix, this definition reduces to the definition of the $\omega$RIP from [RW13]. More
generally, weights allow the flexibility to incorporate prior information about the support set and allow
for weaker assumptions on the dictionary $D$. In particular, a larger weight assigned to a dictionary element
will allow for this element to have larger inner products with the measurement basis vectors. In
this regard, a basis version of our result is the following variant of a theorem from [RW13]:

Proposition 2.6 (from [RW13]). Fix a probability measure $\nu$ on $[n]$, sparsity level $s < n$, and constant $0 <
\delta < 1$. Let $D \in \mathbb{C}^{n \times n}$ be an orthonormal matrix. Let $A$ be an orthonormal system with respect to $\nu$ as in
Definition 2.3 with rows denoted by $r_i$, and consider weights $\omega_1, \omega_2, \dots, \omega_n \ge 1$ such that $\max_i |\langle r_i, d_j \rangle| \le \omega_j$.
Construct an $m \times n$ submatrix $\tilde A$ of $A$ by sampling rows of $A$ according to the measure $\nu$ and renormalizing
each row by $\sqrt{1/m}$. Provided

$$m \ge C \delta^{-2} s \log^3(s) \log(n), \tag{2.9}$$

then with probability $1 - n^{-c \log^3 s}$, the submatrix $\tilde A$ satisfies the $\omega$RIP with parameters $s$ and $\delta$.


An adjusted sampling density for the case when certain vectors in the measurement basis yield larger 
inner products is obtained via the local coherence as defined in the following. 

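Definition 2.7 (Local coherence, [KW14, RW12]). The local coherence of a set $\Phi = \{\phi_i\}_{i=1}^k \subset \mathbb{C}^n$ with respect
to another set $\Psi = \{\psi_j\}_{j=1}^\ell \subseteq \mathbb{C}^n$ is the function $\mu^{\mathrm{loc}}(\Phi, \Psi) \in \mathbb{R}^k$ defined coordinate-wise by

$$\mu_i^{\mathrm{loc}}(\Phi, \Psi) = \mu_i^{\mathrm{loc}} = \sup_{1 \le j \le \ell} |\langle \phi_i, \psi_j \rangle|, \qquad i = 1, 2, \dots, k.$$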
When $\Phi$ and $\Psi$ are orthonormal bases, and a subset of $\Phi$ is used for sampling while $\Psi$ is the basis in
which sparsity is assumed, the main point is that renormalizing the different vectors $\phi_i$ in the sampling
basis by respective factors $1/\mu_i^{\mathrm{loc}}$ does not affect orthogonality of the rows of $\Phi^* \Psi$. In this way, the larger
inner products can be reduced in size. To compensate for this renormalization and retain the orthonormal
system property in the sense of Definition 2.3, one can then adjust the sampling measure. The resulting
preconditioned measurement system with variable sampling density will then yield better bounds $K$ in
Equation (2.5). This yields the following estimate for the restricted isometry property.


Proposition 2.8 ([KW14, RW12]). Let $\Phi = \{\phi_j\}_{j=1}^n$ and $\Psi = \{\psi_k\}_{k=1}^n$ be orthonormal bases of $\mathbb{C}^n$. Assume
the local coherence of $\Phi$ with respect to $\Psi$ is pointwise bounded by the function $\kappa \in \mathbb{R}^n$, that
is, $\sup_{1 \le k \le n} |\langle \phi_j, \psi_k \rangle| \le \kappa_j$. Suppose

$$m \ge C \delta^{-2} \|\kappa\|_2^2\, s \log^3(s) \log(n), \tag{2.10}$$

and choose $m$ (possibly not distinct) indices $j \in [n]$ i.i.d. from the probability measure $\nu$ on $[n]$ given by

$$\nu(j) = \frac{\kappa_j^2}{\|\kappa\|_2^2}.$$

Call this collection of selected indices $\Omega$ (which may possibly be a multiset). Consider the matrix $A \in \mathbb{C}^{m \times n}$
with entries

$$A_{j,k} = \langle \phi_j, \psi_k \rangle, \qquad j \in \Omega,\ k \in [n], \tag{2.11}$$

and consider the diagonal matrix $W = \operatorname{diag}(w) \in \mathbb{C}^{n \times n}$ with $w_j = \|\kappa\|_2 / \kappa_j$. Then with probability at least
$1 - n^{-c \log^3(s)}$, the preconditioned matrix $\frac{1}{\sqrt{m}} W A$ has the restricted isometry property with parameters $\delta$
and $s$.


Remark. In case $\Phi$ and $\Psi$ are incoherent, or if $\kappa_j = K n^{-1/2}$ uniformly for all $j$, then local coherence
sampling as above reduces to the previous results for incoherent systems: in this case, the associated
probability measure $\nu$ is uniform, and the preconditioned matrix reduces to $\frac{1}{\sqrt{m}} W A = \sqrt{n/m}\, A$.
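
The sampling measure and preconditioner of Proposition 2.8 can be sketched as follows. This is a minimal illustration under explicit assumptions: the function names are hypothetical, NumPy is an assumed dependency, the input `U` is taken to hold the inner products $U_{j,k} = \langle \phi_j, \psi_k \rangle$, and $\kappa_j$ is set equal to the local coherence itself (the tightest admissible bound, assumed nonzero), whereas the proposition allows any pointwise upper bound.

```python
import numpy as np

def local_coherence_sampling(U):
    """Given U[j, k] = <phi_j, psi_k>, return (kappa, nu, w) in the notation of Proposition 2.8."""
    kappa = np.max(np.abs(U), axis=1)       # kappa_j = sup_k |<phi_j, psi_k>|
    norm_sq = np.sum(kappa ** 2)            # ||kappa||_2^2
    nu = kappa ** 2 / norm_sq               # sampling measure nu(j)
    w = np.sqrt(norm_sq) / kappa            # preconditioner weights w_j = ||kappa||_2 / kappa_j
    return kappa, nu, w

def sample_preconditioned(U, m, seed=0):
    """Draw m row indices i.i.d. from nu (with repetition) and return (1/sqrt(m)) W A."""
    rng = np.random.default_rng(seed)
    _, nu, w = local_coherence_sampling(U)
    idx = rng.choice(U.shape[0], size=m, replace=True, p=nu)
    return (w[idx, None] * U[idx, :]) / np.sqrt(m)
```

Since $\nu(j)\, w_j^2 = 1$ for every $j$, the reweighted rows $w_j U_{j,\cdot}$ form an orthonormal system with respect to $\nu$ whenever $U$ is unitary, and their entries are bounded by $\|\kappa\|_2$. For the unitary DFT matrix the measure is uniform and $w_j = \sqrt{n}$, which matches the incoherent case described in the remark above.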

Our main result on D-RIP for redundant systems and structured measurements (Theorem 3.1) implies
a strategy for extending local coherence sampling theory to dictionaries, of which Theorem 1.2 is
a special case. Indeed, if $\Psi = \{\psi_j\}$, more generally, is a Parseval frame and $\Phi = \{\phi_i\}$ is an orthonormal
matrix, then renormalizing the different vectors $\phi_i$ in the sampling basis by respective factors $1/\mu_i^{\mathrm{loc}}$ still
does not affect orthogonality of the rows of $\Phi^* \Psi$. We have the following corollary of our main result; we
will establish a more general result, Theorem 3.1, in the next section, from which both Theorem 1.2 and
Corollary 2.9 follow.


Corollary 2.9. Fix a sparsity level $s < N$, and constant $0 < \delta < 1$. Let $D \in \mathbb{C}^{n \times N}$ be a Parseval frame with
columns $\{d_1, \dots, d_N\}$, let $B$ with rows $\{b_1, \dots, b_n\}$ be an orthonormal basis of $\mathbb{C}^n$, and assume the local
coherence of $D$ with respect to $B$ is pointwise bounded by the function $\kappa \in \mathbb{R}^n$, that is, $\sup_{1 \le j \le N} |\langle d_j, b_k \rangle| \le \kappa_k$.
Consider the probability measure $\nu$ on $[n]$ given by $\nu(k) = \frac{\kappa_k^2}{\|\kappa\|_2^2}$ along with the diagonal matrix $W =
\operatorname{diag}(w) \in \mathbb{C}^{n \times n}$ with $w_k = \|\kappa\|_2 / \kappa_k$. Note that the renormalized system $\tilde\Phi = \frac{1}{\sqrt{m}} W B$ is an orthonormal
system with respect to $\nu$, bounded by $\|\kappa\|_2$. Construct $\tilde B \in \mathbb{C}^{m \times n}$ by sampling vectors from $B$ i.i.d. from the
measure $\nu$, and consider the localization factor $\eta = \eta_{D,s}$ of the frame $D$ as in Definition 1.1.
As long as

$$m \ge C \delta^{-2} \eta^2 \|\kappa\|_2^2\, s \log^3(s \eta^2) \log(N)$$

then with probability $1 - N^{-\log^3(2s)}$, the following holds for every signal $f$: the solution $f^\sharp$ of the weighted
$\ell_1$-analysis problem

$$f^\sharp = \operatorname*{argmin}_{\tilde f \in \mathbb{C}^n} \|D^* \tilde f\|_1 \quad \text{such that} \quad \Big\| \frac{1}{\sqrt{m}} W (\tilde B \tilde f - y) \Big\|_2 \le \varepsilon, \tag{$P_{1,w}$}$$

with $y = \tilde B f + e$ for noise $e$ with weighted error $\| \frac{1}{\sqrt{m}} W e \|_2 \le \varepsilon$ satisfies

$$\|f^\sharp - f\|_2 \le C_1 \varepsilon + C_2 \frac{\|D^* f - (D^* f)_s\|_1}{\sqrt{s}}. \tag{2.12}$$

Here $C$, $C_1$, and $C_2$ are absolute constants independent of the dimensions and the signal $f$.

The resulting bound involves a weighted noise model, which can, in the worst case, introduce an 
additional factor of y^max* v/v(z). We believe however that this noise model is just an artifact of the 
proof technique, and that the stability results in Corollary 2.9 should hold for the standard noise model
using the standard $\ell_1$-analysis problem $(P_1)$. In the important case where $D$ is a wavelet frame and $B$ is
the orthonormal discrete Fourier matrix, we believe that total variation minimization, like $\ell_1$-analysis,
should give stable and robust error guarantees with the standard measurement noise model. Indeed,
such results were recently obtained for variable density sampling in the case of orthonormal wavelet
sparsity basis [Poo15a], improving on previous bounds for total variation minimization [KW14, NW13b,
NW13a, AHPR13]. Generalizing such results to the dictionary setting is indeed an interesting direction
for future research. 


3. Our main result and its applications 

As mentioned in the previous section, the case of incoherence between the sampling basis and the
sparsity dictionary, as in Theorem 1.2, is a special case of a bounded orthonormal system. The following
more technical formulation of our main result in the framework of such systems covers both weighted
sparsity models and local coherence and, as we will see, implies Theorem 1.2. The proof of Theorem 3.1
is presented in Section 4.


Theorem 3.1. Fix a probability measure $\nu$ on $[N]$, sparsity level $s < N$, and constant $0 < \delta < 1$. Let
$D \in \mathbb{C}^{n \times N}$ be a Parseval frame. Let $A$ be an orthonormal systems matrix with respect to $\nu$ with rows $r_i$ as in
Definition 2.3, and consider weights $\omega_1, \omega_2, \dots, \omega_N \ge 1$ such that

\[ \max_i |\langle r_i, d_j \rangle| \le \omega_j. \tag{3.1} \]

Define the unrecoverable energy $\eta = \eta_{D,s}$ as in Definition 2.7. Construct an $m \times n$ submatrix $\widetilde{A}$ of $A$ by
sampling rows of $A$ according to the measure $\nu$. Then as long as

\[ m \ge C \delta^{-2} s \eta^2 \log^3(s\eta^2) \log(N), \quad \text{and} \quad m \ge C \delta^{-2} s \eta^2 \log(1/\gamma) \tag{3.2} \]

then with probability $1 - \gamma$, the normalized submatrix $\frac{1}{\sqrt{m}} \widetilde{A}$ satisfies the D-$\omega$RIP with parameters $s$ and $\delta$.


Proof of Theorem 1.2 assuming Theorem 3.1. Let $B$, $D$ and $s$ be given as in Theorem 1.2. We will apply
Theorem 3.1 with $\omega_j = 1$ for all $j$, with $\nu$ the uniform measure on $[n]$, sparsity level $2s$, $\gamma = N^{-\log^3(2s)}$,
$\delta = 0.08$, and matrix $A = \sqrt{n} B$. We first note that since $B$ is an orthonormal basis, $\sqrt{n} B$ is an
orthonormal systems matrix with respect to the uniform measure $\nu$ as in Definition 2.3. In addition, (1.2)
implies (3.1) for the matrix $A = \sqrt{n} B$ with $K = \omega_j = 1$. Furthermore, setting $\gamma = N^{-\log^3(2s)}$, (1.3) implies both
inequalities of (3.2) (adjusting constants appropriately). Thus the assumptions of Theorem 3.1 are in
force. Theorem 3.1 then guarantees that with probability $1 - \gamma = 1 - N^{-\log^3(2s)}$, the uniformly subsampled
matrix $\sqrt{1/m}\,(\sqrt{n}\,\widetilde{B})$ satisfies the D-$\omega$RIP with parameters $2s$ and $\delta$ and weights $\omega_j = 1$. A simple
calculation and the definition of the D-$\omega$RIP (Definition 2.5) show that this implies the D-RIP with parameters
$2s$ and $\delta$. By Theorem 2.2, (1.4) holds and this completes the proof. □

Proof of Corollary 2.9 assuming Theorem 3.1. Corollary 2.9 is implied by Theorem 3.1 because the
preconditioned matrix $\Phi = \frac{1}{\sqrt{m}} W B$, formed by normalizing the $i$th row of $B$ by $w_i = \|\kappa\|_2 / \kappa_i$, constitutes an
orthonormal system with respect to the probability measure $\nu(i)$ and uniform weights $\omega_j = \|\kappa\|_2$. Then
the preconditioned sampled matrix $\widetilde{\Phi}$ satisfies the D-RIP for sparsity $2s$ and parameter $\delta$ according to
Theorem 3.1 with probability $1 - \gamma = 1 - N^{-\log^3(2s)}$. Applying Theorem 2.2 to the preconditioned sampling
matrix $\widetilde{\Phi}$ and Parseval frame $D$ produces the results in Corollary 2.9. □


3.1. Example: Harmonic frame with L more vectors than dimensions. It remains to find examples of a
dictionary with bounded localization factor and an associated measurement system for which the
incoherence condition (1.2) holds. Our main example is that of sampling a signal that is sparse in an oversampled
Fourier system, a so-called harmonic frame [VW04]; the measurement system is the standard basis.
Indeed, one can see by direct calculation that the standard basis is incoherent in the sense of (1.2) to any
set of Fourier vectors, even of non-integer frequencies. We will now show that if the number of frame
vectors exceeds the dimension only by a constant, such a dictionary will also have bounded localization
factor. This setup is a simple example, but our results apply, and it is not covered by previous theory.

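More precisely, we fix $L \in \mathbb{N}$ and consider $N = n + L$ vectors in dimension $n$. We assume that $L$ is such
that $Ls \le \frac{N}{4}$. Then the harmonic frame is defined via its frame matrix $D = (d_{jk})$, which results from the
$N \times N$ discrete Fourier transform matrix $F = (f_{jk})$ by deleting the last $L$ rows. That is, we have

\[ d_{jk} = \frac{1}{\sqrt{n+L}} \exp\Bigl( \frac{2\pi i jk}{n+L} \Bigr) \tag{3.3} \]

for $j = 1, \dots, n$ and $k = 1, \dots, N$. The corresponding Gram matrix satisfies $(D^*D)_{jj} = \frac{n}{N}$ and, for $j \ne k$, by
orthogonality of $F$,

\[ |(D^*D)_{jk}| = \Bigl| (F^*F)_{jk} - \sum_{\ell = n+1}^{N} \bar{f}_{\ell j} f_{\ell k} \Bigr| \le \frac{L}{n+L}. \]

As a consequence, we have for $z$ $s$-sparse and $j \notin \operatorname{supp} z$

\[ |(D^*Dz)[j]| = \Bigl| \sum_{k \in \operatorname{supp} z} (D^*D)_{jk}\, z[k] \Bigr| \le \frac{L}{n+L} \|z\|_1, \]

where we write $z[k]$ to denote the $k$th entry of the vector $z$. Similarly, for $j \in \operatorname{supp} z$, using
$(D^*D)_{jj} - 1 = -\frac{L}{n+L}$, we obtain

\[ |(D^*Dz)[j] - z[j]| = \Bigl| \sum_{k \in \operatorname{supp} z,\, k \ne j} (D^*D)_{jk}\, z[k] - \frac{L}{n+L}\, z[j] \Bigr| \le \frac{L}{n+L} \|z\|_1. \]

So we obtain, using that $D^*$ is an isometry,

\[
\begin{aligned}
\|Dz\|_2^2 = \|D^*Dz\|_2^2 &\ge \sum_{k \in \operatorname{supp} z} |(D^*Dz)[k]|^2 \ge \sum_{k \in \operatorname{supp} z} \bigl( |z[k]| - |(D^*Dz)[k] - z[k]| \bigr)^2 \\
&\ge \sum_{k \in \operatorname{supp} z} \Bigl( |z[k]|^2 - \frac{2L}{n+L} \|z\|_1 |z[k]| \Bigr) \\
&= \|z\|_2^2 - \frac{2L}{n+L} \|z\|_1^2 \ge \Bigl( 1 - \frac{2Ls}{n+L} \Bigr) \|z\|_2^2 \ge \frac{1}{2} \|z\|_2^2.
\end{aligned}
\]

That is, for $z$ with $\|Dz\|_2 = 1$, one has $\|z\|_2 \le \sqrt{2}$. Consequently,

\[
\begin{aligned}
\eta &= \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{\|D^*Dz\|_1}{\sqrt{s}}
= \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{\|(D^*Dz)|_{\operatorname{supp} z}\|_1 + \|(D^*Dz)|_{(\operatorname{supp} z)^c}\|_1}{\sqrt{s}} \\
&\le \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \Bigl( \|D^*Dz\|_2 + \frac{\|(D^*Dz)|_{(\operatorname{supp} z)^c}\|_1}{\sqrt{s}} \Bigr) \\
&= 1 + \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{1}{\sqrt{s}} \sum_{j \in (\operatorname{supp} z)^c} |(D^*Dz)[j]| \\
&\le 1 + \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{(N - s) L \|z\|_1}{N \sqrt{s}}
\le 1 + \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{(N - s) L}{N} \|z\|_2 \le 1 + L\sqrt{2}.
\end{aligned}
\]

□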
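
The Gram matrix estimates and the resulting bound $\eta \le 1 + L\sqrt{2}$ can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: the helper name `harmonic_frame`, the sizes `n`, `L`, `s`, and the sign convention of the exponent are choices made here (the magnitudes tested do not depend on the sign). It draws one random $s$-sparse vector with $Ls \le N/4$ and evaluates the two ratios that appear in the derivation.

```python
import numpy as np

def harmonic_frame(n, L):
    """n x (n+L) harmonic frame: first n rows of the unitary N x N Fourier matrix."""
    N = n + L
    rows = np.arange(1, n + 1)[:, None]
    cols = np.arange(1, N + 1)[None, :]
    # Sign of the exponent is an assumed convention; magnitudes do not depend on it.
    return np.exp(2j * np.pi * rows * cols / N) / np.sqrt(N)

n, L, s = 32, 2, 4                 # L * s = 8 <= N / 4 = 8.5
N = n + L
D = harmonic_frame(n, L)
G = D.conj().T @ D                 # Gram matrix D^* D

diag_err = np.abs(np.diag(G) - n / N).max()        # diagonal entries equal n/N
off_max = np.abs(G - np.diag(np.diag(G))).max()    # off-diagonal entries at most L/N

rng = np.random.default_rng(1)
z = np.zeros(N, dtype=complex)
support = rng.choice(N, size=s, replace=False)
z[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)

# ||Dz||_2^2 >= (1/2) ||z||_2^2 for s-sparse z when L*s <= N/4
energy_ratio = np.linalg.norm(D @ z) ** 2 / np.linalg.norm(z) ** 2
# ||D^* D z||_1 / (sqrt(s) ||Dz||_2), which the argument bounds by 1 + L*sqrt(2)
loc_ratio = np.linalg.norm(G @ z, 1) / (np.sqrt(s) * np.linalg.norm(D @ z))
```

A single random draw does not compute the supremum defining $\eta$; it only confirms that the inequalities hold on that instance.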
Remark. We note here that the dependence on $L$ seems quite pessimistic. Indeed, one often instead
considers the redundancy of the dictionary, namely $r = N/n$. In this case, even for redundancy $r = 2$
we would have $L = n$ and require a full set of measurements according to this argument. However, we
conjecture that rather than a dependence on $L$ one can reduce the dependence to one on the redundancy
$r$. Intuitively, due to the restriction of Parseval normalization on the frame $D$, there is a tradeoff between
redundancy and coherence; an increase in redundancy should be in some way balanced by a decrease
in coherence. This conjecture is supported by numerical experiments and by theoretical lower bounds
for robust recovery in the co-sparse model [GPV15], but we leave a further investigation to future work.

This example also plays an important role in the so-called off the grid compressed sensing setting
[TBSR12, SB12]. In this framework, the signal frequencies are not assumed to lie on a lattice but instead
can assume any values in a continuous interval. When the signal parameters do not lie on a lattice,
the signal may not be truly sparse in the discrete dictionary, and refining the grid to incorporate finer
frequencies may lead to numerical instability. In addition, typical compressed sensing results are difficult to
apply under discretization, making off the grid approaches advantageous. The example we discuss above
gives a possible compromise to nearly on the grid recovery from linear measurements.

Another line of research that relates to this example is that of superresolution (see, e.g., [CFG14,
CFG13] and many followup works). These works study the recovery of frequency sparse signals from
equispaced samples. No assumptions regarding an underlying grid are made, but rather one assumes a
separation distance between the active frequencies. In this sense, the nearly-on-the-grid example just
discussed satisfies their assumption as the corresponding separation distance is close to the grid spacing.
However, the nature of the resulting guarantees is somewhat different. For example, in these works,
every deviation from an exactly sparse signal must be treated as noise, whereas in our result above we
have an additional term capturing the compressibility of a signal. We think that an in-depth comparison
of the two approaches is an interesting topic for future work.


3.2. Example: Fourier measurements and Haar frames of redundancy 2. In this subsection, we present
a second example of a sampling setup that satisfies the assumptions of incoherence and localization
factor of Theorem 3.1. In contrast to the previous example, one needs to precondition and adjust the
sampling density to satisfy these assumptions according to Corollary 2.9, which allows us to understand
the setup of the Fourier measurement basis and a 1D Haar wavelet frame with redundancy 2, as
introduced in the following. Let $n = 2^p$. Recall that the univariate discrete Haar wavelet basis of $\mathbb{C}^{2^p}$ consists
of $h^0 = 2^{-p/2}(1, 1, \dots, 1)$, $h = h_{0,0} = 2^{-p/2}(1, 1, \dots, 1, -1, -1, \dots, -1)$ and the frame basis elements $h_{\ell,k}$ given
component wise by

\[
h_{\ell,k}[j] = h[2^{\ell} j - k] =
\begin{cases}
2^{\frac{\ell - p}{2}} & \text{for } k 2^{p-\ell} \le j < k 2^{p-\ell} + 2^{p-\ell-1}, \\
-2^{\frac{\ell - p}{2}} & \text{for } k 2^{p-\ell} + 2^{p-\ell-1} \le j < k 2^{p-\ell} + 2^{p-\ell}, \\
0 & \text{else,}
\end{cases}
\]

for any $(\ell, k) \in \mathbb{Z}^2$ satisfying $0 < \ell < p$ and $0 \le k < 2^{\ell}$. The corresponding basis transformation matrix is
denoted by $H$.

One can now create a wavelet frame of redundancy 2 by considering the union of this basis and a
circular shift of it by one index. That is, one adds the vector $\tilde{h}^0 = h^0 = 2^{-p/2}(1, 1, \dots, 1)$ and vectors of the
form $\tilde{h}_{\ell,k}[j] = h_{\ell,k}[j + 1]$ for all $(\ell, k)$. Here we identify $2^p + 1 \equiv 1$. This is also an orthonormal basis -
its basis transformation matrix will be denoted by $\tilde{H}$ in the following, and the matrix $D \in \mathbb{C}^{2^p \times 2^{p+1}}$ with
columns

\[ D(:, (\ell, 2k-1)) = \frac{1}{\sqrt{2}}\, h_{\ell,k}, \qquad D(:, (\ell, 2k)) = \frac{1}{\sqrt{2}}\, \tilde{h}_{\ell,k} \tag{3.4} \]

forms a Parseval frame with redundancy 2.
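
A small numerical construction can make the redundant Haar frame concrete. The sketch below is a hedged illustration rather than code from the paper: the function names, zero-based indexing, and the interleaved column ordering are assumptions made here. It builds the orthonormal Haar basis, its circular shift by one index, and stacks both with the factor $1/\sqrt{2}$ so that the resulting $D$ is a Parseval frame.

```python
import numpy as np

def haar_basis(p):
    """Columns form the orthonormal discrete Haar basis of R^(2^p)."""
    n = 2 ** p
    cols = [np.full(n, 2.0 ** (-p / 2))]             # constant vector h^0
    for level in range(p):                           # level 0 gives h = h_{0,0}
        block = 2 ** (p - level)                     # support length at this level
        amp = 2.0 ** ((level - p) / 2)               # amplitude giving unit norm
        for k in range(2 ** level):
            h = np.zeros(n)
            h[k * block : k * block + block // 2] = amp
            h[k * block + block // 2 : (k + 1) * block] = -amp
            cols.append(h)
    return np.column_stack(cols)

def redundant_haar_frame(p):
    """n x 2n frame: Haar basis and its circular shift, each scaled by 1/sqrt(2)."""
    H = haar_basis(p)
    H_shift = np.roll(H, -1, axis=0)                 # shifted[j] = h[j + 1], indices mod n
    D = np.empty((H.shape[0], 2 * H.shape[1]))
    D[:, 0::2] = H / np.sqrt(2)                      # columns from the original basis
    D[:, 1::2] = H_shift / np.sqrt(2)                # columns from the shifted basis
    return D

p = 4
n = 2 ** p
H = haar_basis(p)
D = redundant_haar_frame(p)
```

The identity $DD^* = \frac{1}{2}(HH^* + \tilde{H}\tilde{H}^*) = I$ is what makes the union of two orthonormal bases Parseval after scaling by $1/\sqrt{2}$.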
Corollary 2.9 applies to the example where sparsity is with respect to the redundant Haar frame and
where sampling measurements are rows $\{r_k\}$ from the $n \times n$ orthonormal DFT matrix. Indeed, following
Corollary 6.4 of [KW14], we have the following coherence estimates:

\[ \max\bigl\{ |\langle r_k, h_{\ell,j} \rangle|, \; |\langle r_k, \tilde{h}_{\ell,j} \rangle| \bigr\} \le \kappa_k := \frac{3\sqrt{2\pi}}{\sqrt{k}}. \]

Since $\|\kappa\|_2^2 = 18\pi \sum_{k=1}^{n} k^{-1} \le 18\pi \log_2(n)$ grows only mildly with $n$, the Fourier / wavelet frame example
is a good fit for Corollary 2.9, provided the localization factor of the Haar frame is also small. We will
show that the localization factor of the Haar frame is bounded by $\eta \le \sqrt{2 \log_2(n)}$, leading to the following
corollary.


Corollary 3.2. Fix a sparsity level $s < N$, and constant $0 < \delta < 1$. Let $D \in \mathbb{C}^{n \times N}$ ($N = 2n$) be the redundant
Haar frame as defined in (3.4) and let $A \in \mathbb{C}^{n \times n}$ with rows $\{r_1, \dots, r_n\}$ be the orthonormal DFT matrix.
Consider the diagonal matrix $W = \operatorname{diag}(w) \in \mathbb{C}^{n \times n}$ with $w_k = C' (\log_2(n))^{-1/2} k^{1/2}$, and construct $\widetilde{A} \in \mathbb{C}^{m \times n}$
by sampling rows from $A$ i.i.d. from the probability measure $\nu$ on $[n]$ with power-law decay $\nu(k) = C k^{-1}$.

As long as the number of measurements satisfies $m \ge C_1 \delta^{-2} s \log^3(s \log(n)) \log^3(n)$, then with probability
$1 - n^{-\log^3 s}$, the following holds for every signal $f$: the solution $f^\sharp$ of the weighted $\ell_1$-analysis problem
($P_{1,w}$) with $y = \widetilde{A} f + e$ for noise $e$ with weighted error $\| \frac{1}{\sqrt{m}} W e \|_2 \le \varepsilon$ satisfies

\[ \|f - f^\sharp\|_2 \le C_2 \varepsilon + C_3 \frac{\|D^* f - (D^* f)_s\|_1}{\sqrt{s}}. \]

Here $C$, $C_1$, $C_2$, and $C_3$ are absolute constants independent of the dimensions and the signal $f$.


Proof. To derive this result from Corollary 2.9, it suffices to prove that the redundant Haar frame as
defined above has localization factor at most $\eta \le \sqrt{2 \log_2(n)}$. We will show that for each $s$-sparse $z \in \mathbb{R}^N$,
$D^*Dz$ is at most $3s \log_2 n$-sparse. Here, $D^*D$ is the $N \times N$ Gramian matrix. For that, consider first either
of the two (equal) frame elements $\frac{1}{\sqrt{2}} h^0$ and $\frac{1}{\sqrt{2}} \tilde{h}^0$. Each of these frame elements has non-zero inner
product with exactly two frame elements: $\frac{1}{\sqrt{2}} h^0$ and $\frac{1}{\sqrt{2}} \tilde{h}^0$. So the corresponding columns of $D^*D$ have
only two non-vanishing entries.

Consider then a non-constant frame element $\frac{1}{\sqrt{2}} h_{\ell,k}$. Because $H$ is an orthonormal basis, it is
orthogonal to all $h_{\ell',k'}$ with $(\ell', k') \ne (\ell, k)$, and the same holds within $\tilde{H}$, which is why
$(D^*D)_{(\ell,2k),(\ell',2k')} = (D^*D)_{(\ell,2k-1),(\ell',2k'-1)} = 0$ for $(\ell', k') \ne (\ell, k)$. So it remains to consider
correlations between elements of $H$ and elements of $\tilde{H}$. Again by orthogonality, one has for all pairs $(\ell', k')$ and $(\ell, k)$

\[ \langle h_{\ell',k'}, \tilde{h}_{\ell,k} \rangle = \delta_{(\ell',k'),(\ell,k)} + \langle h_{\ell',k'}, \tilde{h}_{\ell,k} - h_{\ell,k} \rangle, \]

where

\[ \delta_{(a,b),(c,d)} = \begin{cases} 1 & \text{if } (a, b) = (c, d), \\ 0 & \text{else,} \end{cases} \]

denotes the Kronecker delta. By definition, $\tilde{h}_{\ell,k} - h_{\ell,k}$ has only
three non-zero entries. On the other hand, for each level $\ell'$, the supports of the $h_{\ell',k'}$ are disjoint for
different values of $k'$. So for each $\ell'$, the supports of at most 3 of the $h_{\ell',k'}$ can intersect the support
of $\tilde{h}_{\ell,k} - h_{\ell,k}$. As for $\ell' = \ell$, one of these $h_{\ell,k'}$ must be $h_{\ell,k}$, we conclude that for each $\ell'$ at most 3 of
the $|\langle h_{\ell',k'}, \tilde{h}_{\ell,k} \rangle|$ can be nonzero. As there are $\log_2 n$ levels $\ell'$, this contributes at most $3 \log_2 n$ nonzero
entries in the column of $D^*D$ indexed by $(\ell, 2k)$. Together with the diagonal entry $(D^*D)_{(\ell,2k),(\ell,2k)} = \frac{1}{2}$ and noting that a
similar analysis holds for columns indexed by $(\ell, 2k-1)$, we obtain that each column of $D^*D$ has at most
$3 \log_2(n) + 1$ non-zero entries. Now for each $s$-sparse $z$, $D^*Dz$ is a linear combination of the $s$ columns
corresponding to $\operatorname{supp} z$. Consequently, $\|D^*Dz - (D^*Dz)_s\|_0 \le 3s \log_2(n)$ and thus, by Cauchy-Schwarz
and noting that for a Parseval frame $D$, $DD^*$ is the identity,

\[
\begin{aligned}
\eta = \sup_{\|Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{\|D^*Dz\|_1}{\sqrt{s}}
&= \sup_{\|D^*Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{\|(D^*Dz)_s\|_1 + \|D^*Dz - (D^*Dz)_s\|_1}{\sqrt{s}} \\
&\le 1 + \sup_{\|D^*Dz\|_2 = 1,\, \|z\|_0 \le s} \frac{\sqrt{3s \log_2 n}\, \|D^*Dz - (D^*Dz)_s\|_2}{\sqrt{s}} \\
&\le 1 + \sup_{\|D^*Dz\|_2 = 1} \sqrt{3 \log_2 n}\, \|D^*Dz\|_2 = 1 + \sqrt{3 \log_2 n}.
\end{aligned}
\]

□
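
The two facts used in this proof, the column sparsity of the Gramian and the resulting bound on the localization ratio, can be observed on a small example. The following sketch is illustrative only: it rebuilds the redundant Haar frame under the same assumed conventions as in (3.4) (zero-based indexing, a particular column ordering, hypothetical helper name), counts nonzeros per column of $D^*D$ with a numerical tolerance, and evaluates $\|D^*Dz\|_1 / (\sqrt{s}\,\|Dz\|_2)$ on random $s$-sparse vectors.

```python
import numpy as np

def redundant_haar_frame(p):
    """n x 2n frame from the Haar basis and its circular shift, scaled by 1/sqrt(2)."""
    n = 2 ** p
    cols = [np.full(n, 2.0 ** (-p / 2))]
    for level in range(p):
        block, amp = 2 ** (p - level), 2.0 ** ((level - p) / 2)
        for k in range(2 ** level):
            h = np.zeros(n)
            h[k * block : k * block + block // 2] = amp
            h[k * block + block // 2 : (k + 1) * block] = -amp
            cols.append(h)
    H = np.column_stack(cols)
    return np.hstack([H, np.roll(H, -1, axis=0)]) / np.sqrt(2)

p, s = 5, 3
n = 2 ** p
D = redundant_haar_frame(p)
G = D.T @ D                                          # Gramian D^* D (real here)

# Nonzeros per column, with a tolerance to ignore floating-point round-off
nnz_per_column = (np.abs(G) > 1e-10).sum(axis=0)

rng = np.random.default_rng(2)
ratios = []
for _ in range(200):
    z = np.zeros(2 * n)
    z[rng.choice(2 * n, size=s, replace=False)] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(G @ z, 1) / (np.sqrt(s) * np.linalg.norm(D @ z)))
max_ratio = max(ratios)
```

Random trials only give a lower estimate of the supremum defining $\eta$; they are consistent with, but do not replace, the bound $1 + \sqrt{3 \log_2 n}$ derived above.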

Remark. Note that this proof is closely related to the observation that each signal that is $s$-sparse in the
Haar frame of redundancy 2 is also $O(s \log n)$-sparse in the Haar wavelet basis. So a number of
measurements comparable to the one derived here also allows for recovery of the wavelet basis coefficients. In
addition, however, a more refined analysis suggests that the entries of $D^*Dz$ decay quickly - provided
not too many approximate cancellations happen. We conjecture that the number of such approximate
cancellations can be controlled using a more sophisticated analysis, but we leave this to follow-up work.
We hence conjecture that the logarithmic factor can be removed, making the required number of
measurements smaller than for a synthesis approach.


4. Proof of main result 

Our proof of Theorem 3.1 extends the analysis in [RW13], which extends the analysis in [CGV13] to
weighted sparsity and improves on the analysis in [RV08], from orthonormal systems to redundant
dictionaries.

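Proof of Theorem 3.1. Consider the set

\[ \mathcal{D} = \{ u : u \in \mathbb{C}^n, \; u = Dz, \; \|z\|_{\omega,0} \le s, \; \|u\|_2 = 1 \}. \]

Let $\widetilde{A}$ be the $m \times n$ subsampled matrix of $A$ as in the theorem. Then observe that the smallest value $\delta = \delta_s$
which satisfies the D-$\omega$RIP bound (2.8) for $\frac{1}{\sqrt{m}} \widetilde{A}$ is precisely

\[ \delta_s = \sup_{u \in \mathcal{D}} \bigl| u^* \bigl( \tfrac{1}{m} \widetilde{A}^* \widetilde{A} - I_n \bigr) u \bigr|. \]

Since $\frac{1}{m} \widetilde{A}^* \widetilde{A} - I_n$ is a self-adjoint operator, we may instead define for any self-adjoint matrix $B$ the operator

\[ |||B|||_s := \sup_{u \in \mathcal{D}} |\langle Bu, u \rangle| \]

and equivalently write

\[ \delta_s = ||| \tfrac{1}{m} \widetilde{A}^* \widetilde{A} - I_n |||_s. \]

Our goal is thus to bound this quantity. To that end, let $r_1, r_2, \dots, r_m \in \mathbb{C}^n$ denote the $m$ randomly selected
rows of $A$ that make up $\widetilde{A}$. It follows from Definition 2.3 that $\mathbb{E}\, r_i^* r_i = I_n$ for each $i$, where the expectation
is taken with respect to the sampling measure $\nu$. We thus have

\[ \delta_s = ||| \tfrac{1}{m} \widetilde{A}^* \widetilde{A} - I_n |||_s = \Bigl|\Bigl|\Bigl| \frac{1}{m} \sum_{i=1}^{m} r_i^* r_i - I_n \Bigr|\Bigr|\Bigr|_s = \frac{1}{m} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s. \tag{4.1} \]

We may bound the moments of this quantity using a symmetrization argument (see, e.g., Lemma 6.7
of [Rau10]):

\[
\begin{aligned}
\mathbb{E} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s
&\le 2\, \mathbb{E} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} \epsilon_i r_i^* r_i \Bigr|\Bigr|\Bigr|_s \\
&= 2\, \mathbb{E}_r \mathbb{E}_\epsilon \sup_{x \in \mathcal{D}} \Bigl| \Bigl\langle \sum_{i=1}^{m} \epsilon_i r_i^* r_i x, x \Bigr\rangle \Bigr| \\
&= 2\, \mathbb{E}_r \mathbb{E}_\epsilon \sup_{x \in \mathcal{D}} \Bigl| \sum_{i=1}^{m} \epsilon_i |\langle r_i, x \rangle|^2 \Bigr|,
\end{aligned}
\tag{4.2}
\]

where $\epsilon_i$ are independent symmetric Bernoulli ($\pm 1$) random variables.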

Conditional on $(r_i)$, we have a subgaussian process indexed by $\mathcal{D}$. For a set $T$, a metric $d$, and a given
$t > 0$, the covering number $\mathcal{N}(T, d, t)$ is defined as the smallest number of balls of radius $t$ centered at
points of $T$ necessary to cover $T$ with respect to $d$. For fixed $r_i$, we work with the (pseudo-)metric

\[ d(x, z) = \Bigl( \sum_{i=1}^{m} \bigl( |\langle r_i, x \rangle|^2 - |\langle r_i, z \rangle|^2 \bigr)^2 \Bigr)^{1/2}. \tag{4.3} \]

Then Dudley's inequality [Dud67] implies

\[ \mathbb{E}_\epsilon \sup_{x \in \mathcal{D}} \Bigl| \Bigl\langle \sum_{i=1}^{m} \epsilon_i r_i^* r_i x, x \Bigr\rangle \Bigr| \le 4\sqrt{2} \int_0^\infty \sqrt{\log \mathcal{N}(\mathcal{D}, d, t)} \, dt. \tag{4.4} \]

Continuing as in [RW13], inspired by the approach of [CGV13], we estimate the metric $d$ using Hölder's
inequality with exponents $p \ge 1$ and $q \ge 1$ satisfying $1/p + 1/q = 1$ to be specified later. Using also the
reverse triangle inequality, we have for $x, z \in \mathcal{D}$,

\[ d(x, z) \le 2 \sup_{u \in \mathcal{D}} \Bigl( \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2p} \Bigr)^{1/(2p)} \Bigl( \sum_{i=1}^{m} |\langle r_i, x - z \rangle|^{2q} \Bigr)^{1/(2q)}. \tag{4.5} \]

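We will optimize over the values of $p$, $q$ later on. To further bound this quantity, we have $|\langle r_i, d_j \rangle| \le \omega_j$
by assumption. Recall the localization factor in Definition 2.4. Then, fixing $u \in \mathcal{D}$ such that $u = Dz$ with
$\|z\|_{\omega,0} \le s$, we have for any realization of $(r_i)$,

\[
\begin{aligned}
|\langle r_i, u \rangle| &= |\langle DD^* r_i, u \rangle| \\
&= |\langle D^* r_i, D^*Dz \rangle| \\
&\le \sum_j |\langle r_i, d_j \rangle| \, |(D^*Dz)_j| \\
&\le \sum_j \omega_j |(D^*Dz)_j| \\
&= \|D^*Dz\|_{\omega,1} \\
&\le \sqrt{s}\, \eta.
\end{aligned}
\tag{4.6}
\]

The first line uses that $D$ is a Parseval frame, while the last line uses the definition of $\eta$. The quantity $s\eta^2$
will serve as a rescaled sparsity parameter throughout the remaining proof.

Continuing from (4.5), we may bound

\[
\begin{aligned}
\sup_{u \in \mathcal{D}} \Bigl( \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2p} \Bigr)^{1/(2p)}
&= \sup_{u \in \mathcal{D}} \Bigl( \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2} \, |\langle r_i, u \rangle|^{2p-2} \Bigr)^{1/(2p)} \\
&\le (s\eta^2)^{(p-1)/(2p)} \Bigl( \sup_{u \in \mathcal{D}} \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2} \Bigr)^{1/(2p)}.
\end{aligned}
\tag{4.7}
\]

We now introduce the (semi-)norm

\[ \|u\|_{X,q} := \Bigl( \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2q} \Bigr)^{1/(2q)}. \]

Using basic properties of covering numbers and the bound in (4.7), we obtain

\[ \mathbb{E}_\epsilon \sup_{u \in \mathcal{D}} \Bigl| \Bigl\langle \sum_{i=1}^{m} \epsilon_i r_i^* r_i u, u \Bigr\rangle \Bigr| \le C_1 (s\eta^2)^{(p-1)/(2p)} \Bigl( \sup_{u \in \mathcal{D}} \sum_{i=1}^{m} |\langle r_i, u \rangle|^{2} \Bigr)^{1/(2p)} \int_0^\infty \sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} \, dt \tag{4.8} \]

where $C_1$ is an absolute constant.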

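We now estimate the key integral above in two different ways, the first bound being better for small
values of $t$, the second bound better for larger values of $t$.

Small values of t: Following (4.6) and (4.7) and since $\|u\|_2 = 1$ for all $u \in \mathcal{D}$, we have that for any
$u \in \mathcal{D}$,

\[ \|u\|_{X,q} \le (s\eta^2)^{1/2} m^{1/(2q)}. \tag{4.9} \]

Standard covering arguments show that for any seminorm $\|\cdot\|$, one has for the unit ball $B_{\|\cdot\|} \subset \mathbb{C}^{s}$
that $\mathcal{N}(B_{\|\cdot\|}, \|\cdot\|, t) \le (1 + 2/t)^{2s}$. Thus for $\Lambda \subset \{1, \dots, N\}$ with $|\Lambda| = s'$, one can define the seminorm
$\|\cdot\|_\Lambda$ on $\mathbb{C}^{s'}$ by

\[ \|y\|_\Lambda := \Bigl( \sum_{i=1}^{m} |\langle D_\Lambda^* r_i^*, y \rangle|^{2q} \Bigr)^{1/(2q)} \]

and observe that for $u$ with $\|u\|_{X,q} \le 1$ and $u = D_\Lambda y$ for $y \in \mathbb{C}^{s'}$ one has the equivalence
$\|y\|_\Lambda = \|u\|_{X,q}$. Applying this equivalence for $\Lambda = \operatorname{supp}(z)$ and $y = z_\Lambda$, we have that

\[
\begin{aligned}
\mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)
&\le \sum_{\Lambda:\, \omega(\Lambda) \le s} \mathcal{N}\bigl( B_{\|\cdot\|_\Lambda}, \; m^{1/(2q)} \sqrt{s\eta^2}\, \|\cdot\|_\Lambda, \; t \bigr) \\
&\le \binom{N}{s} \bigl( 1 + 2\sqrt{s\eta^2}\, m^{1/(2q)} t^{-1} \bigr)^{2s} \\
&\le (eN/s)^{s} \bigl( 1 + 2\sqrt{s\eta^2}\, m^{1/(2q)} t^{-1} \bigr)^{2s},
\end{aligned}
\]

where we have applied [FR13, Proposition C.3] in the final line.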

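Large values of t: To get a bound for larger values of $t$, we will utilize another embedding, and for
that reason we define the auxiliary set

\[ \mathcal{D}_{\mathrm{aux}} := \{ u : \|D^* u\|_{\omega,1} \le 1, \; u = Dz \in \mathbb{C}^n, \; \|z\|_{\omega,0} \le s \} = \bigcup_{\Lambda:\, \omega(\Lambda) \le s} \mathcal{D}_{\mathrm{aux}}^{\Lambda}, \]

where $\mathcal{D}_{\mathrm{aux}}^{\Lambda} := \{ u : \|D^* u\|_{\omega,1} \le 1, \; u = Dz \in \mathbb{C}^n, \; \operatorname{supp}(z) = \Lambda \}$. Then we have the embedding
$\mathcal{D} \subset \sqrt{s\eta^2}\, \mathcal{D}_{\mathrm{aux}}$ since for any $u \in \mathcal{D}$, $\|D^* u\|_{\omega,1} \le \sqrt{s\eta^2}$ by definition of the localization factor $\eta$.

We now use a variant of Maurey's lemma [Car85], precisely, the variant as stated in Lemma 5.3
in [RW13], in order to deduce a different covering number bound:

Lemma 4.1 (Maurey's lemma). For a normed space $X$, consider a finite set $U \subset X$ of cardinality
$N$, and assume that for every $L \in \mathbb{N}$ and $(u_1, \dots, u_L) \in U^L$, $\mathbb{E}_\epsilon \| \sum_{j=1}^{L} \epsilon_j u_j \|_X \le A\sqrt{L}$, where $\epsilon$ denotes
a Rademacher vector. Then for every $t > 0$,

\[ \log \mathcal{N}(\operatorname{conv}(U), \|\cdot\|_X, t) \le c (A/t)^2 \log(N), \]

where $\operatorname{conv}(U)$ is the convex hull of $U$.

As a corollary, we have the following.

Corollary 4.2. For every $t > 0$,

\[ \sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} \le C\, 2e^{-1/2} \sqrt{2q}\, m^{1/(2q)} \sqrt{2(s\eta^2)}\, t^{-1} \sqrt{\log(N)}. \]

Proof of Corollary 4.2. Consider Lemma 4.1 with norm $\|\cdot\|_{X,q}$, parameter $A = 2e^{-1/2} \sqrt{2q}\, m^{1/(2q)}$,
and $U = \{ \pm \omega_j^{-1} D e_j, \; \pm i \omega_j^{-1} D e_j, \; j \in [N] \}$. First observe that

\[ \sqrt{s\eta^2}\, \mathcal{D}_{\mathrm{aux}} \subset \sqrt{2(s\eta^2)}\, \operatorname{conv}(U). \]

For a Bernoulli random vector $\epsilon = (\epsilon_1, \dots, \epsilon_L)$ and $u_1, \dots, u_L \in U$ we have

\[
\begin{aligned}
\mathbb{E}_\epsilon \Bigl\| \sum_{j=1}^{L} \epsilon_j u_j \Bigr\|_{X,q}
&\le \Bigl( \mathbb{E}_\epsilon \Bigl\| \sum_{j=1}^{L} \epsilon_j u_j \Bigr\|_{X,q}^{2q} \Bigr)^{1/(2q)} \\
&= \Bigl( \mathbb{E}_\epsilon \sum_{i=1}^{m} \Bigl| \Bigl\langle r_i, \sum_{j=1}^{L} \epsilon_j u_j \Bigr\rangle \Bigr|^{2q} \Bigr)^{1/(2q)} \\
&= \Bigl( \sum_{i=1}^{m} \mathbb{E}_\epsilon \Bigl| \Bigl\langle r_i, \sum_{j=1}^{L} \epsilon_j u_j \Bigr\rangle \Bigr|^{2q} \Bigr)^{1/(2q)} \\
&\le 2e^{-1/2} \sqrt{2q} \Bigl( \sum_{i=1}^{m} \bigl\| ( \langle r_i, u_j \rangle )_{j=1}^{L} \bigr\|_2^{2q} \Bigr)^{1/(2q)},
\end{aligned}
\tag{4.10}
\]

where we applied Khintchine's inequality [Khi23] in the last step. Since each $u_j$ consists of a
multiple of a single column of $D$, we also have for each $j$ and $i$ that $|\langle r_i, u_j \rangle| = |\langle D^* r_i^*, \omega_j^{-1} e_j \rangle|$
for some coordinate vector $e_j$ consisting of all zeros with a 1 in the $j$th position. Thus $|\langle r_i, u_j \rangle| \le 1$
for each $j$ and $i$, which means that $\| ( \langle r_i, u_j \rangle )_{j=1}^{L} \|_2 \le \sqrt{L}$ for any $L$ and

\[ \mathbb{E}_\epsilon \Bigl\| \sum_{j=1}^{L} \epsilon_j u_j \Bigr\|_{X,q} \le 2e^{-1/2} \sqrt{2q}\, m^{1/(2q)} \sqrt{L} = A\sqrt{L}. \]

The corollary then follows from Lemma 4.1. □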
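We have thus obtained the two bounds for $t > 0$:

\[
\begin{aligned}
\sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} &\le \sqrt{ s \log(eN/s) + 2s \log\bigl( 1 + 2\sqrt{s\eta^2}\, m^{1/(2q)} t^{-1} \bigr) }, \\
\sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} &\le C \sqrt{q}\, m^{1/(2q)} \sqrt{s\eta^2}\, t^{-1} \sqrt{\log(N)}.
\end{aligned}
\]

We may now bound the integral in (4.4). Without loss, we take the upper integration bound as
$t_0 = \sqrt{s\eta^2}\, m^{1/(2q)}$ because, for $t \ge t_0$, we have $\mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t) = 1$ by (4.9). Splitting the integral in two parts
and using our covering number bounds, we have for $\alpha \in (0, t_0)$,

\[
\begin{aligned}
\int_0^{t_0} \sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} \, dt
&\le \int_0^{\alpha} \sqrt{ s \log(eN/s) + 2s \log\bigl( 1 + 2\sqrt{s\eta^2}\, m^{1/(2q)} t^{-1} \bigr) } \, dt
+ C_1 \sqrt{q}\, m^{1/(2q)} \sqrt{(s\eta^2) \log N} \int_{\alpha}^{t_0} t^{-1} \, dt \\
&\le \alpha \sqrt{s \log(eN/s)} + \sqrt{2s}\, \alpha \sqrt{ \log\bigl( e ( 1 + 2\sqrt{s\eta^2}\, m^{1/(2q)} \alpha^{-1} ) \bigr) }
+ C_2 \sqrt{q}\, m^{1/(2q)} \sqrt{(s\eta^2) \ln(4N)} \, \log\bigl( \sqrt{s\eta^2}\, m^{1/(2q)} / \alpha \bigr),
\end{aligned}
\]

where in the last step we have utilized that for $\alpha > 0$, $\int_0^{\alpha} \sqrt{\ln(1 + t^{-1})} \, dt \le \alpha \sqrt{\ln(e(1 + \alpha^{-1}))}$ (see Lemma
10.3 of [Rau10]). Choosing $\alpha = m^{1/(2q)}$ yields

\[ \int_0^{t_0} \sqrt{\log \mathcal{N}(\mathcal{D}, \|\cdot\|_{X,q}, t)} \, dt \le C_3 \sqrt{ q\, (s\eta^2)\, m^{1/q} \log(N) \log^2(s\eta^2) }. \]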

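Combining this with (4.1), (4.2) and (4.8), we have

\[
\begin{aligned}
\mathbb{E}\delta_s
&\le \frac{ C_3 (s\eta^2)^{(p-1)/(2p)} \sqrt{ q\, m^{1/q} (s\eta^2) \log(N) \log^2(s\eta^2) } }{ m } \; \mathbb{E} \Bigl( \sup_{x \in \mathcal{D}} \sum_{i=1}^{m} |\langle r_i, x \rangle|^2 \Bigr)^{1/(2p)} \\
&\le \frac{ C_3 (s\eta^2)^{1/2 + (p-1)/(2p)} \sqrt{ q \log(N) \log^2(s\eta^2) } }{ m^{1 - 1/(2q)}\, m^{-1/(2p)} } \Bigl( \mathbb{E} \Bigl|\Bigl|\Bigl| \frac{1}{m} \sum_{i=1}^{m} r_i^* r_i - I_n \Bigr|\Bigr|\Bigr|_s + ||| I_n |||_s \Bigr)^{1/(2p)} \\
&\le \frac{ C_3 (s\eta^2)^{1/2 + (p-1)/(2p)} \sqrt{ q \log(N) \log^2(s\eta^2) } }{ m^{1/2} } \sqrt{ \mathbb{E}\delta_s + 1 }.
\end{aligned}
\]

Above we applied Hölder's inequality and used that $1/q + 1/p = 1$ as well as $p \ge 1$. We now choose
$p = 1 + 1/\log(s\eta^2)$ and $q = 1 + \log(s\eta^2)$ to give $(s\eta^2)^{(p-1)/(2p)} \le (s\eta^2)^{(p-1)/2} = (s\eta^2)^{1/(2\log(s\eta^2))} = e^{1/2}$ and

\[ \mathbb{E}\delta_s \le C_4 \sqrt{ (s\eta^2) \log(N) \log^3(s\eta^2) / m } \; \sqrt{ \mathbb{E}\delta_s + 1 }. \]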
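
The choice of Hölder exponents can be sanity-checked with a few lines of arithmetic. The sketch below assumes only that the rescaled sparsity $x = s\eta^2$ exceeds 1, so that $\log x > 0$; the function name and the sample values of $x$ are illustrative. It confirms that the exponents are conjugate and that the prefactor $(s\eta^2)^{(p-1)/(2p)}$ stays below $e^{1/2}$.

```python
import math

def holder_exponents(x):
    """Exponents p, q from the proof for rescaled sparsity x = s * eta^2 > 1."""
    log_x = math.log(x)
    return 1.0 + 1.0 / log_x, 1.0 + log_x

checks = []
for x in (2.0, 10.0, 1e3, 1e6):                     # sample values of s * eta^2 (assumed)
    p, q = holder_exponents(x)
    conjugacy_gap = abs(1.0 / p + 1.0 / q - 1.0)    # should vanish: 1/p + 1/q = 1
    factor = x ** ((p - 1.0) / (2.0 * p))           # prefactor (s eta^2)^((p-1)/(2p))
    checks.append((x, p, q, conjugacy_gap, factor))
```

With $\ell = \log(s\eta^2)$ one has $(p-1)/(2p) = 1/(2(\ell + 1))$, so the prefactor equals $e^{\ell / (2(\ell+1))}$, which is why it never exceeds $e^{1/2}$ while $\sqrt{q}$ contributes the extra logarithmic factor.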
Squaring this inequality and completing the square finally shows that

\[ \mathbb{E}\delta_s \le C_5 \sqrt{ \frac{ (s\eta^2) \log(N) \log^3(s\eta^2) }{ m } } \tag{4.11} \]

provided the term under the square root is at most 1. Then $\mathbb{E}\delta_s \le \delta/2$ for some $\delta \in (0, 1)$ if

\[ m \ge C_6 \delta^{-2} (s\eta^2) \log^3(s\eta^2) \log(N). \tag{4.12} \]


It remains to show that $\delta_s$ does not deviate much from its expectation. For this probability bound,
we may write $\delta_s$ as the supremum of an empirical process as in [Rau10, Theorem 6.25] and apply the
following Bernstein inequality for the supremum of an empirical process:

Theorem 4.3 ([Bou03], Theorem 6.25 of [Rau10]). Let $\mathcal{F}$ be a countable set of functions $f : \mathbb{C}^n \to \mathbb{R}$. Let
$Y_1, \dots, Y_m$ be independent copies of a random vector $Y$ on $\mathbb{C}^n$ such that $\mathbb{E} f(Y) = 0$ for all $f \in \mathcal{F}$, and assume
$f(Y) \le 1$ almost surely. Let $Z$ be the random variable $Z = \sup_{f \in \mathcal{F}} \sum_{\ell=1}^{m} f(Y_\ell)$, and $\mathbb{E}Z$ its expectation. Let
$\sigma^2 > 0$ such that $\mathbb{E}[f(Y)^2] \le \sigma^2$ for all $f \in \mathcal{F}$. Then, for all $t > 0$,

\[ \mathbb{P}(Z \ge \mathbb{E}Z + t) \le \exp\Bigl( - \frac{ t^2 }{ 2(m\sigma^2 + 2\mathbb{E}Z) + 2t/3 } \Bigr). \tag{4.13} \]

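We apply this theorem to provide a probability bound. Let $f_{z,w}(r) = \operatorname{Re}( \langle (r^* r - I) z, w \rangle )$ so that

\[ m\delta_s = \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s = \sup_{(z,w) \in \mathcal{D}} \sum_{i=1}^{m} f_{z,w}(r_i). \]

Since $\mathbb{E}\, r_i^* r_i = I$ we have $\mathbb{E} f_{z,w}(r) = 0$. Moreover, $|f_{z,w}(r)| \le \max_{z \in \mathcal{D}} |\langle r_i, z \rangle|^2 + 1 \le s\eta^2 + 1$. For the variance
term, we have $\mathbb{E}|f_{z,w}(r_i)|^2 \le \mathbb{E}\| (r_i^* r_i - I) z \|_2^2 \le (s\eta^2 + 1)^2$.

Fix $\delta \in (0, 1)$, and suppose the number of measurements $m$ satisfies (4.12) so that $\mathbb{E}\delta_s \le \delta/2$. Then it
follows

\[
\begin{aligned}
\mathbb{P}(\delta_s \ge \delta) &\le \mathbb{P}(\delta_s \ge \mathbb{E}\delta_s + \delta/9) \\
&= \mathbb{P}\Bigl( \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s \ge \mathbb{E} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s + \delta m / 9 \Bigr) \\
&= \mathbb{P}\Bigl( \frac{1}{s\eta^2 + 1} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s \ge \frac{1}{s\eta^2 + 1} \mathbb{E} \Bigl|\Bigl|\Bigl| \sum_{i=1}^{m} ( r_i^* r_i - \mathbb{E}\, r_i^* r_i ) \Bigr|\Bigr|\Bigr|_s + \frac{\delta m / 9}{s\eta^2 + 1} \Bigr) \\
&\le \exp\Biggl( - \frac{ \bigl( \frac{\delta m / 9}{s\eta^2 + 1} \bigr)^2 }{ 2m \bigl( 1 + \frac{\delta}{s\eta^2 + 1} \bigr) + \frac{2}{3} \bigl( \frac{\delta m / 9}{s\eta^2 + 1} \bigr) } \Biggr) \\
&\le \exp\Bigl( - \frac{ \delta^2 m }{ C_7 (s\eta^2) } \Bigr).
\end{aligned}
\tag{4.14}
\]

The last term is bounded by $\gamma \in (0, 1)$ if $m \ge C \delta^{-2} (s\eta^2) \log(1/\gamma)$. Together, we have $\delta_s \le \delta$ with probability
at least $1 - \gamma$ if

\[ m \ge C_8 \delta^{-2} s\eta^2 \max\{ \log^3(s\eta^2) \log(N), \; \log(1/\gamma) \}. \]

This completes the proof. □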


5. Conclusion 

We have introduced a coherence-based analysis of compressive sensing when the signal to be recovered
is approximately sparse in a redundant dictionary. Whereas previous theory only allowed for
unstructured random sensing measurements, our coherence-based analysis extends to structured sensing
measurements such as subsampled uniformly bounded bases, bringing the theory closer to the setting
of practical applications. We also extend the theory of variable density sampling to the dictionary setting,
permitting some coherence between sensing measurements and sparsity dictionary. We further extend
the analysis to allow for weighted sparse expansions. Still, several open questions remain. While we
provided two concrete examples of dictionaries satisfying the bounded localization factor condition
required by our analysis - the oversampled DFT frame and redundant Haar wavelet frame - these bounds
can almost certainly be extended to more general classes of dictionaries, and improved considerably in
the case of the oversampled DFT frame. We have also left several open problems related to the full
analysis for variable density sampling in this setting, including the removal of a weighted noise assumption
in the $\ell_1$-analysis reconstruction method. Finally, we believe that the D-RIP assumption used throughout
our analysis can be relaxed, and that a RIPless analysis [CP12] should be possible and permit
non-uniform signal recovery bounds at a reduced number of measurements. It would also be useful to
extend our results to measurement matrices constructed in a deterministic fashion for those applications
in which randomness is not admissible; of course this is a challenge even in the classical setting [DeV07].